76 research outputs found

    HaploJuice: accurate haplotype assembly from a pool of sequences with known relative concentrations

    Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. [1] proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively. HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.’s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five different assembly algorithms: Hmmfreq [1], ShoRAH [2], SAVAGE [3], PredictHaplo [4] and QuRe [5]. Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate. HaploJuice achieves high accuracy in haplotype reconstruction, making Ranjard et al.’s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.
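
To illustrate why known proportions can stand in for barcodes, here is a minimal sketch that is not the HaploJuice algorithm (which estimates empirical proportions and uses dynamic programming). It assumes noise-free allele frequencies and uses hypothetical sub-sample names A, B and C. With proportions 62.5%, 25% and 12.5%, every subset of sub-samples has a distinct expected frequency, so an observed allele frequency at a site points to the subset carrying that allele.

```python
from itertools import combinations

# Mixing proportions from the abstract; the names A, B, C are hypothetical labels.
PROPORTIONS = {"A": 0.625, "B": 0.25, "C": 0.125}


def subset_frequencies(proportions):
    """Map every subset of sub-samples to its expected allele frequency."""
    names = sorted(proportions)
    table = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            table[subset] = sum(proportions[name] for name in subset)
    return table


def closest_subset(observed_freq, proportions=PROPORTIONS):
    """Return the subset whose expected frequency is nearest to the observed one."""
    table = subset_frequencies(proportions)
    return min(table, key=lambda subset: abs(table[subset] - observed_freq))
```

For example, `closest_subset(0.37)` selects sub-samples B and C, whose combined expected frequency is 0.375. Real coverage is noisy, which is the reason the abstract describes estimating empirical proportions first.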

    Integration over song classification replicates: Song variant analysis in the hihi

    Human expert analyses are commonly used in bioacoustic studies and can potentially limit the reproducibility of their results. In this paper, a machine learning method is presented to statistically classify avian vocalizations. Automated approaches were applied to isolate bird songs from long field recordings, assess song similarities, and classify songs into distinct variants. Because no positive controls were available to assess the true classification of variants, multiple replicates of automatic classification of song variants were analyzed to investigate clustering uncertainty. The automatic classifications were more similar to the expert classifications than expected by chance. Application of these methods demonstrated the presence of discrete song variants in an island population of the New Zealand hihi (Notiomystis cincta). The geographic patterns of song variation were then revealed by integrating over classification replicates. Because this automated approach considers variation in song variant classification, it reduces potential human bias and facilitates the reproducibility of the results.
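
The abstract does not say how the replicates were combined. One common way to summarise clustering uncertainty, shown here only as an assumed illustration, is a co-association matrix: for each pair of songs, the fraction of replicates that placed them in the same variant. The function name and the toy label lists are hypothetical.

```python
def co_association(replicates):
    """Fraction of replicates in which each pair of songs shares a cluster label.

    `replicates` is a list of label lists, one per classification run, all of
    the same length (one label per song). Label values need not match across
    runs; only co-membership within a run is used.
    """
    n_songs = len(replicates[0])
    counts = [[0] * n_songs for _ in range(n_songs)]
    for labels in replicates:
        for i in range(n_songs):
            for j in range(n_songs):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_runs = len(replicates)
    return [[count / n_runs for count in row] for row in counts]


# Three toy replicates over four songs (hypothetical data).
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
matrix = co_association(runs)
```

Pairs with values near 1 are grouped together consistently, while intermediate values flag songs whose variant assignment depends on the replicate.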

    Reassembling haplotypes in a mixture of pooled amplicons when the relative concentrations are known: A proof-of-concept study on the efficient design of next-generation sequencing strategies

    Next-generation sequencing can be costly and labour intensive. Usually, the sequencing cost per sample is reduced by pooling amplified DNA (amplicons) derived from different individuals on the same sequencing lane. Barcodes unique to each amplicon permit short-read sequences to be assigned appropriately. However, the cost of the library preparation increases with the number of barcodes used. We propose an alternative to barcoding: by using different known proportions of individually-derived amplicons in a pooled sample, each is characterised a priori by an expected depth of coverage. We have developed a Hidden Markov Model that uses these expected proportions to reconstruct the input sequences. We apply this method to pools of mitochondrial DNA amplicons extracted from kangaroo meat, genus Macropus. Our experiments indicate that the sequence coverage can be efficiently used to index the short reads and that we can reassemble the input haplotypes when secondary factors impacting the coverage are controlled. We therefore demonstrate that, by combining our approach with standard barcoding, the cost of the library preparation is reduced to a third. This research was supported by the Australian Research Council Discovery Project Grant #DP160103474.

    Complete mitochondrial genome of the green-lipped mussel, Perna canaliculus (Mollusca: Mytiloidea), from long nanopore sequencing reads

    We describe here the first complete assembly of the mitochondrial genome of the New Zealand green-lipped mussel, Perna canaliculus. The assembly was performed de novo from a mix of long nanopore sequencing reads and short sequencing reads. The genome is 16,005 bp long. Comparison to other Mytiloidea mitochondrial genomes indicates important gene rearrangements in this family.

    Cell Cycle Gene Networks Are Associated with Melanoma Prognosis

    BACKGROUND: Our understanding of the molecular pathways that underlie melanoma remains incomplete. Although several published microarray studies of clinical melanomas have provided valuable information, we found only limited concordance between these studies. Therefore, we took an in vitro functional genomics approach to understand melanoma molecular pathways. METHODOLOGY/PRINCIPAL FINDINGS: Affymetrix microarray data were generated from A375 melanoma cells treated in vitro with siRNAs against 45 transcription factors and signaling molecules. Analysis of these data using unsupervised hierarchical clustering and Bayesian gene networks identified proliferation-associated RNA clusters, which were co-ordinately expressed across the A375 cells and also across melanomas from patients. The abundance in metastatic melanomas of these cellular proliferation clusters and their putative upstream regulators was significantly associated with patient prognosis. An 8-gene classifier derived from gene network hub genes correctly classified the prognosis of 23/26 metastatic melanoma patients in a cross-validation study. Unlike the RNA clusters associated with cellular proliferation described above, co-ordinately expressed RNA clusters associated with immune response were clearly identified across melanoma tumours from patients but not across the siRNA-treated A375 cells, in which immune responses are not active. Three uncharacterised genes, which the gene networks predicted to be upstream of apoptosis- or cellular proliferation-associated RNAs, were found to significantly alter apoptosis and cell number when over-expressed in vitro. CONCLUSIONS/SIGNIFICANCE: This analysis identified co-expression of RNAs that encode functionally-related proteins, in particular, proliferation-associated RNA clusters that are linked to melanoma patient prognosis. 
Our analysis suggests that A375 cells in vitro may be valid models in which to study the gene expression modules that underlie some melanoma biological processes (e.g., proliferation) but not others (e.g., immune response). The gene expression modules identified here, and the RNAs predicted by Bayesian network inference to be upstream of these modules, are potential prognostic biomarkers and drug targets

    Data from: Modelling competition and dispersal in a statistical phylogeographic framework

    Competition between organisms influences the processes governing the colonization of new habitats. As a consequence, species or populations arriving first at a suitable location may prevent secondary colonization. While adaptation to environmental variables (e.g., temperature, altitude, etc.) is essential, the presence or absence of certain species at a particular location often depends on whether or not competing species co-occur. For example, competition is thought to play an important role in structuring mammalian community assembly. It can also explain spatial patterns of low genetic diversity following rapid colonization events or the “progression rule” displayed by phylogenies of species found on archipelagos. Despite the potential of competition to maintain populations in isolation, past quantitative analyses have largely ignored it because of the difficulty in designing adequate methods for assessing its impact. We present here a new model that integrates competition and dispersal into a Bayesian phylogeographic framework. Extensive simulations and analysis of real data show that our approach clearly outperforms the traditional Mantel test for detecting correlation between genetic and geographic distances. Most importantly, we demonstrate that competition can be detected with high sensitivity and specificity from the phylogenetic analysis of genetic variation in space.
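
For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of a permutation Mantel test. It is not the authors' Bayesian model. It assumes two symmetric distance matrices given as nested lists, and the function names, the permutation count and the seed are illustrative choices.

```python
import random
from statistics import mean


def upper_triangle(matrix, order):
    """Flatten the above-diagonal entries after relabelling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the labels of one matrix only
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Toy data: five sites on a line, genetic distance proportional to geographic.
sites = [0, 1, 2, 3, 4]
geographic = [[abs(a - b) for b in sites] for a in sites]
genetic = [[2 * abs(a - b) for b in sites] for a in sites]
r, p_value = mantel_test(genetic, geographic)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent, which is why an ordinary correlation test on the flattened matrices would be inappropriate.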

    A hidden Markov model approach to indicate Bryde's whale acoustics

    Increasing sound in the ocean from human activity potentially threatens marine animals that use sound to communicate, detect prey, avoid predators and function within their ecosystem. The detection and classification of sound produced by marine animals, such as whales and fish, is an important component in noise mitigation strategies, while also providing valuable insights into their ecology. Traditionally, visual surveys are conducted to assess how these animals utilize a specific area, often underestimating the number of individuals as they don’t spend much time at the surface. Long-term passive acoustic monitoring efforts have become more prevalent to monitor such animals. The large datasets collected can be impractical to manually process, necessitating the development of automated detection methods, which often produce mixed results owing to the broad frequency range and variable duration of many biological sounds. Here we describe a novel approach for automated detection of underwater biophonic sounds employing hidden Markov models (HMM). Acoustic data was collected at a single listening station in Hauraki Gulf, from October 2014 to April 2016. HMM detection models were developed for Bryde’s whales (Balaenoptera edeni) that were used as a model organism because they are notoriously hard to study with traditional visual surveys and produce a characteristic call. Bryde’s whale calls also directly overlap the sounds of anthropogenic activity, in particular the sound of vessels transiting to the busiest port in New Zealand; therefore monitoring whale calls is of utmost importance when confronting increasing sound in the ocean. Vocalizations were detected with a sensitivity of 77% and false positive rate of 23%. Bryde’s whale vocalizations were detected on 11% of all recordings. Overall, there were significantly more detections during summer (n = 1716) than winter (n = 447), and significantly more during the day (n = 1991) compared to night (n = 1264). 
This study shows the feasibility of using HMMs on long-term acoustic datasets. The method has the potential to be used for a wide range of soniferous animals that, like the Bryde’s whale, produce unique sounds. The detection method would be particularly useful for mitigation and management strategies of species that are difficult to detect using traditional visual methods. This research was funded by a Rutherford Discovery Fellowship from the Royal Society of New Zealand (RDF-UOA1302) to CAR, including a PhD scholarship to RLP.
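
As a rough illustration of HMM-based detection, the sketch below decodes a sequence of discretised acoustic frames into "noise" and "call" states with the Viterbi algorithm. It is a toy under stated assumptions, not the study's detector: the two states, the "low"/"high" energy symbols and all probabilities are hypothetical values chosen for the example.

```python
import math

# Hypothetical two-state model; none of these values come from the study.
STATES = ("noise", "call")
START = {"noise": 0.9, "call": 0.1}
TRANS = {
    "noise": {"noise": 0.9, "call": 0.1},
    "call": {"noise": 0.2, "call": 0.8},
}
EMIT = {
    "noise": {"low": 0.8, "high": 0.2},
    "call": {"low": 0.3, "high": 0.7},
}


def viterbi(observations):
    """Return the most probable state sequence for the observed symbols."""
    # Log probabilities avoid numerical underflow on long recordings.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES
    }
    back_pointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        back_pointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Runs of frames decoded as "call" would be reported as detections. Comparing them with manually annotated calls is what yields figures such as the sensitivity and false positive rate quoted in the abstract.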